64 research outputs found

The Lie algebra cohomology of jets

Author: Kim Yunhyong
Publication date: 01/01/2002

Let g be a finite-dimensional complex semisimple Lie algebra. We present a new calculation of the continuous cohomology of the Lie algebra z g[[z]]. In particular, we give an explicit formula for the Laplacian on the Lie algebra cochains, from which we deduce that the cohomology in each dimension is a finite-dimensional representation of g which contains any irreducible representation of g at most once.

arXiv.org e-Print Archive

CiteSeerX

Enlighten

MPG.PuRe

Implicit reference to citations: a study of astronomy

Authors: Kim Yunhyong; Webber Bonnie
Publication date: 05/10/2006

This paper presents results on automatically classifying pronouns within articles into those which refer to cited research and those which do not. It also discusses automatically linking the pronouns that do refer to citations to their corresponding citations. The study focuses on the pronoun "they" as used in papers in astronomy journals. The paper describes a classifier trained on maximum entropy principles, using features defined by the distance to preceding citations and the category of the verbs associated with the pronoun under consideration.
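As a rough illustration of the "distance to preceding citations" feature mentioned in the abstract above, the sketch below counts how many tokens separate each occurrence of "they" from the nearest earlier citation. The citation pattern (a four-digit year in parentheses), the `<CIT>` placeholder and the function name `citation_distances` are assumptions made for this example; the paper's actual feature definitions are not given in the abstract.

```python
import re

# Assumption: a citation is marked by a year in parentheses, e.g. "(2001)".
CITATION = re.compile(r"\(\d{4}\)")


def citation_distances(text, pronoun="they"):
    """Return, for each occurrence of the pronoun, the number of tokens
    back to the nearest preceding citation (None if there is none)."""
    # Replace every citation with a placeholder token, then split on whitespace.
    tokens = CITATION.sub(" <CIT> ", text).split()
    distances = []
    last_citation = None
    for index, token in enumerate(tokens):
        if token == "<CIT>":
            last_citation = index
        elif token.strip(".,;:").lower() == pronoun:
            if last_citation is None:
                distances.append(None)
            else:
                distances.append(index - last_citation)
    return distances


sample = ("Smith et al. (2001) measured the flux. They found a decline. "
          "We agree, and they confirm it.")
print(citation_distances(sample))
```

A value like this could serve as one input feature to a classifier; the verb-category feature described in the abstract would need a separate lexical resource and is not sketched here.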

Enlighten

Detecting Family Resemblance: Automated Genre Classification.

Authors: Kim Dr Yunhyong; Ross Seamus
Publication date: 01/01/2006

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material for improving research. The paper compares the roles of visual layout, stylistic features and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of materials populated with documents of the nineteen most popular genres found in our experimental data set.

Crossref

Directory of Open Access Journals

Enlighten

Automating Metadata Extraction: Genre Classification

Authors: Kim Dr Yunhyong; Ross Seamus
Publication date: 01/01/2006

A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF ([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions: as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylometric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.

Examining Variations of Prominent Features in Genre Classification.

Authors: Kim Dr Yunhyong; Ross Seamus
Publication date: 01/06/2007

This paper investigates the correlation between features of three types (visual, stylistic and topical) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes, showing that the strongest features for making one classification are not necessarily the best features for carrying out another.

Crossref

Enlighten

Metadata and Other Stories Online: Is Metadata a Love Letter to the Future?

Author: Kim Yunhyong
Publication date: 03/12/2015

No abstract available

Enlighten

Data, Information, and Knowledge: "where is the Life we have lost in living?"

Author: Kim Yunhyong
Publication venue: Dagstuhl Seminar Proceedings. 10291 - Automation in Digital Preservation
Publication date: 01/01/2010

This abstract raises the question of whether current practices in digital preservation properly address the findability of digital objects. It is also intended as a starting point for discussing the preservation of digital information in contrast to digital data. The abstract is exploratory and informal.

Dagstuhl Research Online Publication Server

Feature Type Analysis in Automated Genre Classification

Authors: Kim Dr Yunhyong; Ross Seamus
Publication date: 01/01/2007

In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. The majority of previous studies in genre classification have created models based on an amalgamated representation of a document using a multitude of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. By independently modeling and comparing classifiers based on features belonging to three types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.

Building a Document Genre Corpus: a Profile of the KRYS I Corpus

Authors: Berninger Vera; Kim Yunhyong; Ross Seamus
Publication date: 18/10/2008

This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to have applicable value in this direction. By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, as well as on a way forward for creating a better corpus for testing automated genre classification tasks and applying these tasks to other domains.

Crossref

Enlighten

Formulating representative features with respect to document genre classification

Authors: Kim Dr Yunhyong; Ross Seamus
Publication date: 01/01/2008

Genre classification (e.g. whether a document is a scientific article or a magazine article) is closely bound to the physical and conceptual structure of the document as well as the level of depth involved in the text. Hence, it provides a means of ranking documents retrieved by search tools according to metrics other than topical similarity. Moreover, the structural information derived from genre classification can be used to locate target information within the text. In previous studies, the detection of genre classes has been attempted by using some normalised frequency of terms or combinations of terms in the document (here, "term" refers to words, phrases, syntactic units, sentences and paragraphs, as well as other patterns derived from deeper linguistic or semantic analysis). These approaches largely neglect how the term is distributed throughout the document. Here, we report the results of automated experiments based on distributive statistics of words, in order to present evidence that the term distribution pattern is a better indicator of genre class than term frequency.
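The contrast between term frequency and term distribution drawn in the abstract above can be made concrete with a small sketch. The two toy documents, the four-segment split and the function names are assumptions chosen for illustration only; the abstract does not specify which distributive statistics the authors used.

```python
def term_frequency(tokens, term):
    """Normalised frequency: occurrences of the term divided by document length."""
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens)


def term_distribution(tokens, term, n_segments=4):
    """Share of the term's occurrences falling in each of n_segments
    equal-width slices of the document, from start to end."""
    counts = [0] * n_segments
    for index, token in enumerate(tokens):
        if token == term:
            # Map the token position to its segment (0 .. n_segments - 1).
            counts[index * n_segments // len(tokens)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_segments
    return [count / total for count in counts]


# Two toy documents with the same frequency of "method" but different placement.
doc_a = ["method", "method", "x", "x", "x", "x", "x", "x"]
doc_b = ["method", "x", "x", "x", "x", "x", "method", "x"]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, term_frequency(doc, "method"), term_distribution(doc, "method"))
```

Both documents have the same normalised frequency for "method", so a frequency-only feature cannot separate them, whereas the per-segment shares differ. This is only the intuition behind the claim, not a reproduction of the reported experiments.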